(54) Data reorganisation apparatus

(57) Data reorganisation apparatus 
comprises a double buffer 
arrangement (20,21) in which data 
is written into each buffer by rows 
and is read out by columns. The 
inputs and outputs of the buffers 
are time-division multiplexed, which 
reduces the required width of each 
buffer by the product of the input 
and output multiplexing factors. 
The apparatus can be used for 
corner turning of image data e.g. 
receiving data in sub-frame order 
and reorganising it into scan-line 
order for display. 
SPECIFICATION

Data re-organisation apparatus 

This invention relates to data re-organisation apparatus.
The invention is particularly, although not exclusively, concerned with re-organisation of image
data. When processing image data, it is often convenient to divide each image frame into a
number of sub-frames of a size more convenient for processing. However, in order to display
the data, it is necessary to output the data as a sequence of scan lines. This involves
re-organising the data, since each sub-frame contains portions of a number of different scan lines
and, conversely, each line is divided among a number of different sub-frames.

This data re-organisation operation, for converting between the sub-frame order and the
scan-line order, is sometimes referred to as corner turning since, as will be shown, it is equivalent to
writing the data into a three-dimensional address space as a first set of parallel planes and then
reading it from the address space as a second set of parallel planes at right angles to the first
set.

This corner-turning may be performed using a buffer store having a width equal to the
product of the sizes of the input and output data words. (By the width of a store is meant the
number of bit positions which can be accessed in parallel for reading or writing.) For example, if
the input and output data are both in the form of 32-bit words, then the corner-turning buffer
would have a width equal to 32 x 32 = 1024 (1K) bit positions. These 1K bit positions are
logically organised as a 32 x 32 array. Input data words are written into the rows of the array,
and output data words are read out of the columns, to achieve the desired corner-turning.
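
The basic row-write, column-read arrangement can be sketched in Python as follows. This is a minimal illustration only: the function name corner_turn and the convention that bit 0 of a word occupies the first column are assumptions made for the example and are not taken from the specification.

```python
def corner_turn(words, width=32):
    """Write `width` input words into the rows of a width x width bit array,
    then read the columns back out as `width` output words."""
    assert len(words) == width
    # rows[r][c] holds bit c of input word r (bit 0 = least significant; assumed).
    rows = [[(w >> c) & 1 for c in range(width)] for w in words]
    out = []
    for c in range(width):
        word = 0
        for r in range(width):
            # Bit r of output word c comes from row r, column c.
            word |= rows[r][c] << r
        out.append(word)
    return out
```

Applying corner_turn twice returns the original words, since the operation is a bit-matrix transposition.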

However, this method of corner-turning requires a very wide buffer store, which in turn
requires a large number of memory components. For example, if 4-bit-wide RAM components
are used, a total of 256 such components are required to provide a 1K-bit wide store.

The object of the present invention is to alleviate this problem so as to reduce the required 
number of memory components. 

Summary of the invention
According to the invention there is provided data reorganisation apparatus comprising: 

(a) a buffer store having a width equal to p x q bit positions, these positions being logically 
arranged in rows and columns with p bits per row and q bits per column, 

(b) multiplexing means for receiving a succession of input data words each of n x p bits and
converting these into a succession of p-bit groups at n times the clock rate of the input words,

(c) input means for writing each p-bit group into a selected row of bit positions in the buffer 
store, 

(d) output means for reading a succession of q-bit groups from selected columns of bit 
positions in the buffer store, and 

(e) demultiplexing means for assembling the q-bit groups read from the buffer store into
m x q-bit words at one mth the clock rate of the q-bit groups, wherein p, q, n and m are all integers
greater than one.

It can be seen that the apparatus in accordance with the invention handles input and output
words of n x p and m x q bits respectively, using a buffer store which is only p x q bits wide. In
comparison, the basic corner-turning arrangement described above would require a buffer store
of width n x p x m x q. In other words, the invention reduces the required width of the buffer by
a factor of n x m, with a corresponding saving in the number of components.

This saving is achieved by increasing the clock rate at which the buffer operates relative to the
input and output clock rates: the buffer must operate n times faster than the input data when
writing to the buffer, and m times faster than the output data when reading. However, this is in
general a favourable trade-off since the speed of the buffer increases only linearly with n (or m)
whereas the width of the buffer decreases as the product n x m.

For example, in a particular embodiment of the invention to be described below, the
apparatus handles input and output data words of 32 bits, using a buffer store 64 bits wide;
that is, p = q = 8 and n = m = 4. In this case, the width of the buffer store is reduced by a
factor of 16 compared with the basic arrangement described above, whereas the speed of the
buffer is increased by a factor of four.
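
The arithmetic of this trade-off can be checked with a short Python sketch. The function name buffer_requirements and the ram_width parameter are assumptions introduced for the example; the 4-bit RAM width is the figure used elsewhere in this specification.

```python
def buffer_requirements(p, q, n, m, ram_width=4):
    """Compare the basic corner-turning buffer with the multiplexed one."""
    basic_width = (n * p) * (m * q)   # input word size times output word size
    reduced_width = p * q             # width of the multiplexed buffer
    return {
        "basic_width": basic_width,
        "reduced_width": reduced_width,
        "width_factor": basic_width // reduced_width,  # equals n * m
        "basic_rams": basic_width // ram_width,
        "reduced_rams": reduced_width // ram_width,
        "write_speedup": n,   # buffer writes run n times the input word rate
        "read_speedup": m,    # buffer reads run m times the output word rate
    }
```

With p = q = 8 and n = m = 4 this gives a width of 64 instead of 1024, and sixteen 4-bit RAM components instead of 256.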

One data re-organisation apparatus in accordance with the invention will now be described by 
way of example with reference to the accompanying drawings. 
Brief description of the drawings 

Figure 1 is a block diagram of apparatus for processing image data, including a data
re-organisation unit in accordance with the invention.
Figures 2, 3 and 4 show the data re-organisation unit in detail.
Figure 5 is a schematic diagram showing the logical address space of the data re-organisation
unit.

Figure 6 illustrates a modification of part of the re-organisation unit.

Description of an embodiment of the invention 
Fig. 1 shows apparatus for processing image data. The apparatus includes an array processor
10, consisting of 1024 processing elements (PE) connected together in rows and columns to
form a 32 x 32 array. All the processing elements are operable in parallel, under control of a
single stream of control signals from a common control unit (not shown). Each processing
element contains a single-bit arithmetic and logic unit, and has a 16K x 1 bit memory. The
memories in the array processor form a three-dimensional store, having 16K individually
addressable planes, each plane consisting of an array of 32 x 32 bits, one in each PE. Any
selected plane can be read out, over a 32-bit highway 11.

Details of the array processor 10 form no part of the present invention and so will not be
described further. The array processor 10 may, for example, be similar to that described in U.S.
Patent No. 3,979,728.

Input data for the array processor 10 can be supplied by a video input device 12, such as a
camera, and output data from the array processor can be fed to a video display device 13.

The video input and output devices handle the image data in the form of a series of video
frames. Each frame consists of 1024 horizontal scan lines, each line containing 1024 picture
elements (pixels). Each pixel may be encoded as a single bit (for black-and-white images) or as a
plurality of bits (for grey-scale or colour images). For simplicity, only the black-and-white case
will be considered here; it will be appreciated by those skilled in the art that the invention is
equally applicable to the processing of grey-scale or colour images.

For the purpose of processing, each frame is divided into a plurality of sub-frames, each of
which consists of an array of 32 x 32 pixels. Each of those sub-frames can therefore be mapped
directly on to the 32 x 32 array of processing elements PE, with one pixel per processing
element. Successive sub-frames are stored in successive memory planes in the array processor,
allowing it to operate on any part of the image as required.

Input data from the video input device 12 to the array processor 10, and output data from
the processor to the video display device 13, pass through a data re-organisation unit 14. This
re-organises the data as will be described so as to convert it between the scan-line format
required by the video devices, and the sub-frame format required by the array processor.

Data re-organisation unit 

Fig. 2 shows the data re-organisation unit 14 in detail.
The unit comprises two buffer stores 20,21 which are used alternately for reading and
writing, so as to provide a double buffer arrangement. The buffers are controlled by a selection
signal SEL so that when SEL = 1, buffer 20 is selected for writing and buffer 21 for reading,
and when SEL = 0, buffer 20 is used for reading and buffer 21 for writing.

Each buffer 20,21 consists of sixteen random-access memory (RAM) components 22. Each
RAM 22 contains 512 individually addressable locations and has four bit positions, i.e. each
location contains four bits which can be written or read in parallel. In other words, each RAM is
four bits wide, and therefore each buffer 20,21 has an overall width of 16 x 4 = 64 bit
positions. These 64 bit positions are logically organised, as shown, as a square array having eight
rows and eight columns. All the RAMs in the buffer 20 are addressed in parallel by a nine-bit
address A0-A8 which selects one of the 512 locations in each RAM. Similarly, the buffer 21 is
addressed by a nine-bit address A'0-A'8.

The data re-organisation unit 14 receives input data words on a 32-bit wide path 23, from
either the array processor 10 or the video input device 12. These words are multiplexed down
to an 8-bit wide path 24, by means of a multiplexing switch 25. The path 24 therefore carries a
stream of eight-bit bytes at a clock rate four times that of the input data words. This path is
connected in parallel to both buffers 20,21.
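
The action of the multiplexing switch can be sketched in Python as a generator that emits n groups of p bits per input word. The name multiplex is hypothetical, and the choice of sending the least significant byte first is an assumption; the specification does not state the byte order on path 24.

```python
def multiplex(words, n=4, p=8):
    """Split each (n x p)-bit input word into n successive p-bit groups."""
    mask = (1 << p) - 1
    for word in words:
        for k in range(n):
            # Assumed order: group k carries bits k*p .. k*p + p - 1 of the word.
            yield (word >> (k * p)) & mask
```

Each input word yields four bytes, which models the byte stream running at four times the input word rate.
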

Buffer 20 has a decoder 26 which is enabled when SEL = 1, i.e. when this buffer is selected
for writing. Similarly, buffer 21 has a decoder 27 which is enabled when SEL = 0. The currently
enabled decoder 26 or 27 decodes three control bits W0,W1,W2 to produce a write enable
signal which selects one row of bit positions in the associated buffer (e.g. the row indicated by
X---X in Fig. 2). This causes the input data byte on path 24 to be written into the selected row.

Referring now to Fig. 3, reading from the buffers 20,21 is controlled by three bits S0,S1,S2.
Bit S2 is decoded along with the selection signal SEL in a decoder 30 to produce one of four
output enable signals OE1-OE4 as follows:

The signals OE1 and OE2 are connected to the output enable terminals of the two columns of
RAMs in buffer 20, and the signals OE3 and OE4 are connected to the output enable terminals
of the two columns of RAMs in buffer 21. The data outputs of the RAMs are connected to eight
4:1 switches 31, controlled by the bits S0,S1. These switches select one bit position from each
RAM.

Thus, it can be seen that SEL selects one of the buffers 20,21 for reading, S2 selects one
column of RAMs within that buffer, and S0,S1 select one column of bit positions (such as that
represented by X---X in Fig. 3) from the selected column of RAMs. The bits are read out on an
8-bit output path 32.

The path 32 is connected in parallel to the data inputs of four 8-bit registers 33. These
registers are clocked in turn by signals from a decoder 34, so as to assemble each group of four
successive bytes into a 32-bit word. In other words, the registers 33 demultiplex the data,
converting it from a succession of 8-bit bytes into 32-bit words at one quarter of the clock rate
of the bytes. The output of the registers 33 is fed either to the video display 13 or to the array
processor 10.
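
The demultiplexing registers can be modelled in Python as below. This is a sketch under stated assumptions: the name demultiplex is hypothetical, and placing the first byte in the least significant position of the output word is an assumed convention, not one given in the specification.

```python
def demultiplex(groups, m=4, q=8):
    """Assemble successive q-bit groups into (m x q)-bit words."""
    mask = (1 << q) - 1
    registers = [0] * m          # models the m registers clocked in turn
    words = []
    for i, group in enumerate(groups):
        registers[i % m] = group & mask
        if i % m == m - 1:
            # All m registers now hold one complete output word.
            word = 0
            for k, value in enumerate(registers):
                word |= value << (k * q)   # assumed: first group is least significant
            words.append(word)
    return words
```

One output word is produced for every four bytes read, i.e. at one quarter of the byte clock rate.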

Referring now to Fig. 4, the buffers 20,21 are controlled by two 12-bit counters 40,41. The 
bits of each counter are numbered 0-11 where bit 0 is the least significant bit. 

Bits 2,3,4,5,6,0,1,10,11 of counter 40 supply a write address WA0-WA8, while bits 7,8,9
supply the control signals W0,W1,W2. Similarly, bits 2,3,4,5,6,10,11,0,1 of counter 41
supply a read address RA0-RA8, while bits 7,8,9 supply the control signals S0,S1,S2.
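
The way the counter bits are permuted to form the addresses and control signals can be expressed as a small Python sketch. The helper names pick, write_fields and read_fields are hypothetical; the bit lists are those stated in the preceding paragraph.

```python
# Counter bit feeding each address bit, listed in the order WA0..WA8 / RA0..RA8.
WRITE_ADDR_BITS = (2, 3, 4, 5, 6, 0, 1, 10, 11)   # counter 40
READ_ADDR_BITS = (2, 3, 4, 5, 6, 10, 11, 0, 1)    # counter 41
CONTROL_BITS = (7, 8, 9)                          # W0-W2 or S0-S2

def pick(count, bits):
    """Gather the listed counter bits into a value, first listed bit lowest."""
    return sum(((count >> b) & 1) << i for i, b in enumerate(bits))

def write_fields(count):
    """Return (write address WA0-WA8, row select W0-W2) for counter 40."""
    return pick(count, WRITE_ADDR_BITS), pick(count, CONTROL_BITS)

def read_fields(count):
    """Return (read address RA0-RA8, column select S0-S2) for counter 41."""
    return pick(count, READ_ADDR_BITS), pick(count, CONTROL_BITS)
```

Each of the 4096 counter values maps to a distinct combination of location and row (or column), so one full count addresses every byte position in a buffer exactly once.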

The read and write addresses are connected to the inputs of a switching circuit 42, which is
controlled by the signal SEL. When SEL = 1, the switching circuit takes the position shown,
so that the address A0-A8 for buffer 20 is supplied by the write address WA0-WA8 while the
address A'0-A'8 for buffer 21 is supplied by the read address RA0-RA8. When SEL = 0 the
circuit 42 is switched over so that these connections are reversed.

Bits WA5,WA6 also provide the control for the multiplexing switch 25 (Fig. 2) and bits 
RA7,RA8 provide the control for the demultiplexing registers 33 by way of decoder 34 (Fig. 3). 

The counter 40 is incremented by a clock signal C.IN which has a frequency equal to four
times the input data word rate. Similarly, the counter 41 is incremented by a clock signal
C.OUT at a frequency four times the desired output data word rate.

When the counter 40 reaches its maximum count value (all ones) it stops and produces a
signal FULL which indicates that the buffer which is currently being used for writing is now full.
Similarly, when the counter 41 reaches its maximum count value, it stops and produces a signal
EMPTY which indicates that the buffer which is currently being used for reading is now empty.
When both these signals are true, an AND gate 43 is enabled, and this switches a bistable
circuit 44 into its opposite state so as to complement the value of SEL. This reverses the roles of
the two buffers so that the buffer which has just been written to is now selected for reading, and
vice versa.
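
The swap logic can be modelled behaviourally in Python. This is a simplified sketch, not a description of the circuit: the class name SwapControl and its method names are hypothetical, the initial value of SEL is an assumption, and the reloading of the counters on a swap follows the LOAD behaviour of the preset registers 45,46 (presets placed in bits 7-11, bits 0-6 cleared).

```python
class SwapControl:
    """Behavioural model of counters 40,41, AND gate 43 and bistable 44."""
    MAX_COUNT = (1 << 12) - 1   # all ones in a 12-bit counter

    def __init__(self, sel=1):
        self.sel = sel           # initial SEL value is an assumption
        self.write_count = 0     # counter 40
        self.read_count = 0      # counter 41

    @property
    def full(self):
        return self.write_count == self.MAX_COUNT

    @property
    def empty(self):
        return self.read_count == self.MAX_COUNT

    def clock_in(self):
        """One beat of C.IN; the counter stops once FULL is reached."""
        if not self.full:
            self.write_count += 1

    def clock_out(self):
        """One beat of C.OUT; the counter stops once EMPTY is reached."""
        if not self.empty:
            self.read_count += 1

    def try_swap(self, write_preset=0, read_preset=0):
        """Complement SEL and reload both counters when FULL and EMPTY."""
        if not (self.full and self.empty):
            return False
        self.sel ^= 1
        self.write_count = write_preset << 7   # five-bit preset into bits 7-11
        self.read_count = read_preset << 7
        return True
```

The swap occurs only when both conditions hold, so a fast writer waits for the reader to finish and a fast reader waits for the writer.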

The AND gate 43 also produces a LOAD signal which causes preset values from two five-bit
registers 45,46 to be loaded into bits 7-11 of the respective counters 40,41, the remaining
bits 0-6 being reset to zero. These preset values allow the re-organisation unit to handle words
of different sizes if required. For handling 32-bit input and output words, both the preset values
are zero; for smaller word sizes, they are set to non-zero values.

Operation

It can be seen that each buffer 20,21 contains a total of 32K bits (i.e. 16 RAMs each with
512 x 4 bits). The bits are regarded as being logically arranged in a 32 x 32 x 32 cube as
shown in Fig. 5. (This Figure relates to the buffer 20; buffer 21 is similar except that it has
address bits A'0-A'8 instead of A0-A8.)

As shown, the x-dimension of this address space is addressed by bits S0,S1,S2,A5,A6, where
bits A5,A6 specify one of four vertical layers, and bits S0,S1,S2 specify one vertical plane of
bits within this layer. The y-dimension is addressed by bits A0-A4. The z-dimension is
addressed by bits W0,W1,W2,A7,A8, where bits A7,A8 specify one of four horizontal layers,
and bits W0,W1,W2 specify one horizontal bit plane within this layer.

When writing data into buffer 20, each byte is written horizontally in this address space,
parallel to the x-axis, into a location specified by A0-A8 and W0-W2. As can be seen from Fig.
4, when writing to the buffer 20, bits A5,A6 come from the least significant end of counter 40,
bits A0-A4 from the middle, and bits W0,W1,W2,A7,A8 from the most significant end. Thus,
the bits A5,A6 are incremented for each byte, so that successive bytes are written into
successive byte locations along the direction of the x-axis. A complete 32-bit word is therefore
written along a row parallel to the x-axis. The bits A0-A4 are incremented for each word, so
that successive words are written into successive rows in the direction of the y-axis. A complete
32 x 32 plane of data is therefore built up parallel to the x-y plane. Successive data planes are
written in the direction of the z-axis, as the bits W0,W1,W2,A7,A8 are incremented.

When reading from the buffer 20, each byte is read vertically, parallel to the z-axis, from a
location specified by the bits A0-A8 and S0-S2. As seen from Fig. 4, when reading from the
buffer, the bits A7,A8 are derived from the least significant end of the counter 41, bits A0-A4
from the middle, and bits S0,S1,S2,A5,A6 from the most significant end. Thus, the bits A7,A8
are incremented for each byte, so that successive bytes are read from successive byte locations
along the direction of the z-axis. A complete 32-bit word is therefore read from a column
parallel to the z-axis. The bits A0-A4 are incremented for each word so that successive words
are read out from successive columns in the direction of the y-axis. In this way, a complete
plane of data parallel to the y-z plane is read out. Successive data planes in the direction of the
x-axis are read out as the bits S0,S1,S2,A5 and A6 are incremented.

In summary, data is written into the buffer as a sequence of planes parallel to the x-y plane,
and is then read out as a sequence of planes parallel to the y-z plane (i.e. at right angles to the
first planes). This enables the buffer to act as a corner-turning buffer for re-organising data.
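
The complete write and read sequence for one buffer can be simulated in Python as a check on the address mapping. This is a sketch under stated assumptions: the names pick and corner_turn_buffer are hypothetical, the byte and bit ordering within a word (least significant first) is assumed, and the RAM array is modelled simply as 512 locations of 8 rows by 8 columns rather than as sixteen separate components.

```python
WRITE_ADDR_BITS = (2, 3, 4, 5, 6, 0, 1, 10, 11)   # counter 40 -> A0..A8
READ_ADDR_BITS = (2, 3, 4, 5, 6, 10, 11, 0, 1)    # counter 41 -> A0..A8
CONTROL_BITS = (7, 8, 9)                          # W0-W2 or S0-S2

def pick(count, bits):
    """Gather the listed counter bits into a value, first listed bit lowest."""
    return sum(((count >> b) & 1) << i for i, b in enumerate(bits))

def corner_turn_buffer(in_words):
    """Write 1024 32-bit words by rows, then read 1024 words by columns."""
    assert len(in_words) == 1024
    # ram[location][row w][column s], each entry a single bit.
    ram = [[[0] * 8 for _ in range(8)] for _ in range(512)]

    for count in range(4096):                     # one count per input byte
        addr = pick(count, WRITE_ADDR_BITS)
        row = pick(count, CONTROL_BITS)
        byte = (in_words[count >> 2] >> (8 * (count & 3))) & 0xFF
        for s in range(8):
            ram[addr][row][s] = (byte >> s) & 1   # byte fills one row

    out_words = [0] * 1024
    for count in range(4096):                     # one count per output byte
        addr = pick(count, READ_ADDR_BITS)
        col = pick(count, CONTROL_BITS)
        byte = sum(ram[addr][w][col] << w for w in range(8))  # one column
        out_words[count >> 2] |= byte << (8 * (count & 3))
    return out_words
```

Under these assumptions, input word number 32z + y holds the bits along x, and output word number 32x + y holds the bits along z, so bit z of output word 32x + y equals bit x of input word 32z + y.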

In the system shown in Fig. 1, data from the array processor 10 is received by the buffer in
sub-frame order, and successive sub-frames are therefore written into the buffer in successive
x-y planes. When the buffer is full, it contains a complete row of sub-frames, consisting of 32
complete scan lines. The data is then read out of successive y-z planes. Each of these planes
contains the 1024 bits making up a single scan line. Thus the output data is in the correct order
for feeding to the video display 13. The operation of the buffer is similar for data passing
between the video input device 12 and the array processor 10.


Variable sequence generator 

The arrangement described above may be modified by replacing the counters 40,41 and the
switching circuit 42 by a pair of variable address sequence generators, one for each buffer. Fig. 6 shows
the generator for buffer 20; that for buffer 21 is identical except that it is controlled by the
inverse of SEL, and produces the address bits A'0-A'8 instead of A0-A8.

The variable sequence generator comprises a programmable read-only memory (PROM) 60
and two counters 61,62 which produce two five-bit counts A and B. The PROM has 512
individually addressable locations, each of which holds six bits, providing six output signals
X,D,C,AE,BE and F. Bits C and D provide two single-bit counts which can be combined to act as
a two-bit count. Bit X acts as the carry-out for the two-bit count. Bits AE and BE are connected
to the enable inputs EN of the counters 61,62 so that whenever one of those bits is true the
corresponding count A or B is incremented at the next clock beat. Bit F provides an output
signal FINISH indicating the end of the address sequence.

The sequence generator receives a 12-bit preset start address from a register 63. This controls
the length of the generated address sequence, in the same way as registers 45,46 in Fig. 4.
The generator also receives a 5-bit sequence number SEQ which selects a particular sequence.

The PROM 60 is addressed by a nine-bit address. The first two bits C',D' of this address are
supplied by a two-way switch 64 controlled by bit X. When X = 0, the switch is in the position
shown and hence selects bits C,D. When X = 1, the switch is set into the opposite position and
therefore selects two preset bits from the register 63. The next two address bits are supplied by
carry-out signals AC,BC from the counters 61,62. The remaining five address bits are supplied
by the sequence number SEQ.

The carry-out signals AC,BC are also fed to the load terminals LD of the respective counters
61,62 so that, whenever one of these counters overflows, it is reloaded with preset bits from
the register 63.

It can be seen that the sequence generator provides two five-bit counts A,B and two single-bit
counts C and D. By suitably programming the PROM 60, these four counts can be assembled in
various different ways to form a single 12-bit count. For example, it may be desired to assemble
the counts in the order A,D,C,B where A provides the least significant five bits of the 12-bit count
and B provides the most significant five bits.

This count sequence can be achieved by programming the first 16 locations of the PROM 60
as shown in Table 1 below.

It can be seen that the output AE is always equal to 1. Hence, the counter 61 is always enabled
so that count A is incremented at each clock beat. This is necessary since count A represents the
least significant bits of the count sequence.

When count A overflows, AC is true and it can be seen from Table 1 that this causes the value
of D to reverse, i.e. each location with AC = 1 has D equal to the complement of D'. Similarly, if
both AC and D' are true, then the value of C is reversed. The effect of this is to cause the two
bits C,D to step through the count sequence 00,01,10,11; i.e. the bits C,D provide a two-bit
count driven by the carry-out of count A.

When AC, C and D' are all true, the output signal BE is produced, and this causes count B to
be incremented. Also, the signal X is produced, which causes the signals C',D' to be selected
from the preset inputs, rather than from C and D; this causes the count C,D to be re-initialised at
the specified preset value.

When AC,BC,C and D' are all true, the output signal F is produced, indicating the end of the 
sequence. 
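
The counting behaviour described for the order A,D,C,B can be modelled in Python as follows. This is a behavioural sketch only and does not reproduce the PROM contents: the function name adcb_sequence and its preset parameters are hypothetical, the carry-out of a five-bit count is assumed to be asserted while the count stands at 31, and the two-bit count is held as a single value with C as its upper bit and D as its lower bit.

```python
def adcb_sequence(preset_a=0, preset_cd=0, preset_b=0):
    """Yield 12-bit counts assembled as A (bits 0-4), D (bit 5), C (bit 6), B (bits 7-11)."""
    a, cd, b = preset_a, preset_cd, preset_b
    while True:
        c, d = cd >> 1, cd & 1
        yield a | (d << 5) | (c << 6) | (b << 7)
        ac = a == 31                 # carry-out of count A (assumed at terminal count)
        bc = b == 31                 # carry-out of count B
        x = ac and cd == 3           # carry-out of the two-bit count C,D
        if x and bc:
            return                   # signal F: end of the sequence
        a = preset_a if ac else a + 1    # A is always enabled; reloaded on overflow
        if x:
            cd = preset_cd           # C,D re-initialised from the preset bits
            b += 1                   # BE: count B incremented
        elif ac:
            cd += 1                  # C,D step through 00,01,10,11
```

With all presets at zero the generator runs through the full 12-bit count; a non-zero preset for B shortens the sequence.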

Referring again to Fig. 6, this also shows the way in which the address bits A0-A8 for the
buffer 20 are derived from the output of the sequence generator. Address bits A0-A4 are
obtained from the counter 62. Address bits A5,A6 and A7,A8 are selected by switches 65,66,
both of which are controlled by the signal SEL. When SEL = 1, the switches are set in the
position shown, so that A5 and A6 are supplied by C and D, and A7,A8 are supplied by the two
least significant bits of counter 61. When SEL = 0, the switches 65,66 are set into the opposite
position, so that A5,A6 now come from counter 61 and A7,A8 are supplied by C and D. The
three most significant bits of counter 61 provide the bits W0,W1,W2 and S0,S1,S2.

It will be appreciated that many other modifications to the system described above may be
made without departing from the scope of the invention. For example, the buffers 20,21 may
be organised as 16 x 4 arrays of bit positions instead of as 8 x 8 arrays. This would allow
higher rates of data transfer into (or out of) the buffers than in the opposite direction.

CLAIMS 

1. Data reorganisation apparatus comprising:

(a) a buffer store having a width equal to p x q bit positions, these positions being logically
arranged in rows and columns with p bits per row and q bits per column,

(b) multiplexing means for receiving a succession of input data words each of n x p bits and
converting these into a succession of p-bit groups at n times the clock rate of the input words,

(c) input means for writing each p-bit group into a selected row of bit positions in the buffer
store,

(d) output means for reading a succession of q-bit groups from selected columns of bit
positions in the buffer store, and

(e) demultiplexing means for assembling the q-bit groups read from the buffer store into
m x q-bit words at one mth the clock rate of the q-bit groups, where p, q, n and m are all integers
greater than one.

2. Apparatus according to Claim 1 wherein the buffer store comprises a plurality of
random-access memory (RAM) components each having a plurality of addressable locations and each
location containing a plurality of bits which can be accessed in parallel, wherein the number of
RAM components times the number of bits in each RAM location equals p x q.

3. Apparatus according to Claim 2 wherein all the RAM components in the buffer are
addressed in parallel so as to select a corresponding location in each RAM component.

4. Apparatus according to Claim 3 including means for generating a write address, means
for generating a read address, and switching means for selectively applying either the write
address or the read address to the RAM components in the buffer.

5. Apparatus according to Claim 4 wherein said multiplexing means is controlled by a
predetermined portion of said write address.

6. Apparatus according to Claim 4 or 5 wherein said demultiplexing means is controlled by
a predetermined portion of said read address.

7. Apparatus according to any one of Claims 4 to 6 wherein the means for generating the
write address comprises a first counter, predetermined bits of which provide said write address,
and further bits of which provide a control signal for selecting the row of bit positions into which
the p-bit group is to be written.

8. Apparatus according to Claim 7 wherein the means for generating the read address
comprises a second counter, predetermined bits of which provide said read address, and further
bits of which provide a control signal for selecting the column of bit positions from which the
q-bit group is to be read.

9. Apparatus according to any preceding Claim including a second buffer store which
operates in conjunction with the first-mentioned buffer store to provide a double buffer
arrangement in which data is written into the first buffer while it is being read from the second
and vice versa.

10. Apparatus according to Claim 9 when dependent upon Claim 4 wherein said switching
means is operable to apply the write address to either buffer store and to apply the read address
to the other buffer store.

11. Apparatus according to Claim 10 including means for operating the switching means so
as to reverse the application of the read and write addresses to the buffer stores upon detecting
that a predetermined number of p-bit groups has been written into the buffer store currently
addressed by the write address and a predetermined number of q-bit groups has been read from
the other buffer store.

12. Data reorganisation apparatus substantially as hereinbefore described with reference to
Figs. 2 to 4 of the accompanying drawings.

13. Data reorganisation apparatus substantially as hereinbefore described with reference to
Figs. 2,3 and 5 of the accompanying drawings.

14. Image processing apparatus comprising

(a) means for processing image data in sub-frame order,

(b) means for displaying image data in scan-line order, and

(c) data reorganisation apparatus according to any preceding claim, connected between the
processing means and the displaying means, for converting between said sub-frame and said
scan-line order for display.

Printed in the United Kingdom for Her Majesty's Stationery Office. Dd 881893S, 1985, 4235. 

Published at The Patent Office, 25 Southampton Buildings, London, WC2A 1AY, from which copies may be obtained.
Publication number: GB2160685

Data reorganisation apparatus comprises a 
double buffer arrangement (20,21) in which 
data is written into each buffer by rows and is 
read out by columns. The inputs and outputs of 
the buffers are time-division multiplexed, 
which reduces the required width of each 
buffer by the product of the input and output 
multiplexing factors. The apparatus can be 
used for corner turning of image data e.g. 
receiving data in sub-frame order and 
reorganising it into scan-line order for display. 
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Description of GB2160685 



SPECIFICATION 

Data re-organisation apparatus 

This invention relates to data re-organisation apparatus. 

The invention is particularly although not exclusively concerned with re-organisation of image data. Wl 
processing image data, it is often convenient to divide each image frame into a number of sub-frames of 
size more conveninent for processing. However, in order to display the data, it is necessary to output the 
data as a sequence of scan lines. This involves reorganising the data, since each sub-frame contains 
portions of a number of different scan lines and, conversely, each line is divided amoung a number of 
different sub-frames. 

This data re-organisation operation, for converting between the sub-frame order and the scan 
line order, is sometimes referred to as corner turning since, as will be shown, it is equivalent to writing t 
data into a three-dimensional address space as a first set of parallel planes and then reading it from the 
address space as a second set of parallel planes at right angles to the first set. 

This corner-turning may be performed using a buffer store having a width equal to the product of the si2 
of the input and output data words. (By the width of a store is meant the number of bit positions which c 
be accessed in parallel for reading or writing). For example, if the input and output data are both in the 
form of 32-bit words, then the corner-turning buffer would have a width equal to 32 X 32 = 1024 (1 K) 
positions. These 1 K bit positions are logically organised as a 32 X 32 array. Input data words are writte 
into the rows of the array, and output data words are read out of the columns, to achieve the desired con 
turning. 

However, this method of corner-turning requires a very wide buffer store, which in turn requires a large 
number of memory components. For example, if 4-bit-wide RAM components are used, a total of 256 si 
components are required to provide a 1 K-bit wide store. 

The object of the present invention is to alleviate this problem so as to reduce the required number of 
memory components. 

Summary of the invention 

According to the invention there is provided data reorganisation apparatus comprising: 

(a) a buffer store having a width equal to p x q bit positions, these positions being logically arranged in 
rows and columns with p bits per row and q bits per column, 

(b) multiplexing means for receiving a succession of input data words each of n x p bits and converting 
these into a succession of p-bit groups at n times the clock rate of the input words, 

(c) input means for writing each p-bit group into a selected row of bit positions in the buffer store, 

(d) output means for reading a succession of q-bit groups from selected columns of bit positions in the 
buffer store, and 

(e) demultiplexing means for assembling the q-bit groups read from the buffer store into m x q-bit word 
one mth the clock rate of the q-bit groups, wherein p,q,n and m are all integers greater than one. 

It can be seen that the apparatus in accordance with the invention handles input and output words of n x 
and m x q bits respectively, using a buffer store which is only p x q bits wide. In comparison, the basic 
corner-turning arrangement described above would require a buffer store of width n x p x m x q. In othe 
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words, the invention reduces the required width of the buffer by a factor ofnxm, with a corresponding 
saving in the number of components. 

This saving is achieved by increasing the clock rate at which the buffer operates relative to the input and
output clock rates: the buffer must operate n times faster than the input data when writing to the buffer, and
m times faster than the output data when reading. However, this is in general a favourable trade-off since
the speed of the buffer increases only linearly with n (or m) whereas the width of the buffer decreases as
the product n x m.

For example, in a particular embodiment of the invention to be described below, the apparatus handles
input and output data words of 32 bits, using a buffer store 64 bits wide; that is, p = q = 8 and n = m = 4. In
this case, the width of the buffer store is reduced by a factor of 16 compared with the basic arrangement
described above, whereas the speed of the buffer is increased by a factor of four.
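The figures quoted in this example can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function name `buffer_tradeoff` and the layout of its result are assumptions made for this example and do not appear in the specification.

```python
def buffer_tradeoff(p, q, n, m):
    """Compare the multiplexed buffer with the basic corner-turning store.

    p, q: bits per row and per column of the buffer's bit-position array.
    n, m: input and output multiplexing factors.
    """
    if min(p, q, n, m) < 2:
        raise ValueError("p, q, n and m must all be integers greater than one")
    return {
        "input_word_bits": n * p,
        "output_word_bits": m * q,
        "basic_width": (n * p) * (m * q),  # product of input and output word sizes
        "multiplexed_width": p * q,        # width of the buffer in the invention
        "width_reduction": n * m,
        "write_speedup": n,                # buffer clock relative to input word rate
        "read_speedup": m,                 # buffer clock relative to output word rate
    }


example = buffer_tradeoff(p=8, q=8, n=4, m=4)
```

For p = q = 8 and n = m = 4 this gives a basic width of 1024 bits against a multiplexed width of 64 bits, which is the factor of 16 stated above.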

One data re-organisation apparatus in accordance with the invention will now be described by way of 
example with reference to the accompanying drawings. 

Brief description of the drawings 

Figure 1 is a block diagram of apparatus for processing image data, including a data reorganisation unit in
accordance with the invention.

Figures 2, 3 and 4 show the data re-organisation unit in detail. 

Figure 5 is a schematic diagram showing the logical address space of the data re-organisation unit.

Figure 6 illustrates a modification of part of the re-organisation unit.

Description of an embodiment of the invention

Fig. 1 shows apparatus for processing image data. The apparatus includes an array processor 10, consisting
of 1024 processing elements (PE) connected together in rows and columns to form a 32 x 32 array. All 1024
processing elements are operable in parallel, under control of a single stream of control signals from a
common control unit (not shown). Each processing element contains a single-bit arithmetic and logic unit
and has a 16K x 1 bit memory. The memories in the array processor form a three-dimensional store,
having 16K individually addressable planes, each plane consisting of an array of 32 x 32 bits, one in each
PE. Any selected plane can be read out, over a 32-bit highway 11.

Details of the array processor 10 form no part of the present invention and so will not be described further.
The array processor 10 may, for example, be similar to that described in U.S. Patent No. 3,979,728.

Input data for the array processor 10 can be supplied by a video input device 12, such as a camera, and 
output data from the array processor can be fed to a video display device 13. 

The video input and output devices handle the image data in the form of a series of video frames. Each
frame consists of 1024 horizontal scan lines, each line containing 1024 picture elements (pixels). Each
pixel may be encoded as a single bit (for black-and-white images) or as a plurality of bits (for grey-scale or
colour images). For simplicity, only the black-and-white case will be considered here; it will be
appreciated by those skilled in the art that the invention is equally applicable to the processing of
grey-scale or colour images.

For the purpose of processing, each frame is divided into a plurality of sub-frames, each of which consists
of an array of 32 x 32 pixels. Each of those sub-frames can therefore be mapped directly on to the 32 x 32
array of processing elements PE, with one pixel per processing element. Successive sub-frames are stored
in successive memory planes in the array processor, allowing it to operate on any part of the image as
required.

Input data from the video input device 12 to the array processor 10, and output data from the processor 10 to
the video display device 13, pass through a data re-organisation unit 14. This re-organises the data as will
be described so as to convert it between the scan-line format required by the video devices, and the
sub-frame format required by the array processor.

Data re-organisation unit 

Referring to Fig. 2, this shows the data re-organisation unit 14 in detail. 

The unit comprises two buffer stores 20,21 which are used alternately for reading and writing, so as to
provide a double buffer arrangement. The buffers are controlled by a selection signal SEL so that when
SEL = 1, buffer 20 is selected for writing and buffer 21 for reading, and when SEL = 0, buffer 20 is used
for reading and buffer 21 for writing.

Each buffer 20,21 consists of sixteen random-access memory (RAM) components 22. Each
RAM 22 contains 512 individually addressable locations and has four bit positions, i.e. each location
contains four bits which can be written or read in parallel. In other words, each RAM is four bits wide, and
therefore each buffer 20,21 has an overall width of 16 x 4 = 64 bit positions. These 64 bit positions are
logically organised as shown as a square array having eight rows and eight columns. All the RAMs in the
buffer 20 are addressed in parallel by a nine-bit address A0-A8 which selects one of the 512 locations in
each RAM. Similarly, the buffer 21 is addressed by a nine-bit address A'0-A'8.

The data re-organisation unit 14 receives input data words on a 32-bit wide path 23, from either the array
processor 10 or the video input device 12. These words are multiplexed down to an 8-bit wide path 24, by
means of a multiplexing switch 25. The path 24 therefore carries a stream of eight-bit bytes at a clock rate
four times that of the input data words. This path is connected in parallel to both buffers 20,21.
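The action of the multiplexing switch 25 can be sketched in Python as follows. This is a minimal model, not the circuit itself: the function names are hypothetical, and the choice of sending the least significant byte first is an assumption, since the text does not state the byte order.

```python
def multiplex_word(word, n=4, p=8):
    """Split one (n*p)-bit word into n p-bit groups.

    Assumption: the least significant group is sent first.
    """
    mask = (1 << p) - 1
    return [(word >> (p * i)) & mask for i in range(n)]


def byte_stream(words, n=4, p=8):
    """Yield the p-bit groups for a succession of input words.

    Each input word produces n groups, so the stream runs at n times
    the word rate.
    """
    for word in words:
        yield from multiplex_word(word, n, p)
```

With the default parameters each 32-bit word yields four bytes, so two input words produce a stream of eight bytes on the narrow path.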

Buffer 20 has a decoder 26 which is enabled when SEL = 1, i.e. when this buffer is selected for writing.
Similarly, buffer 21 has a decoder 27 which is enabled when SEL = 0. The currently enabled decoder 26 or
27 decodes three control bits W0, W1, W2 to produce a write enable signal which selects one row of bit
positions in the associated buffer (e.g. the row indicated by X-X in Fig. 2). This causes the input data byte
on path 24 to be written into the selected row.

Referring now to Fig. 3, reading from the buffers 20,21 is controlled by three bits S0, S1, S2.

Bit S2 is decoded along with the selection signal SEL in a decoder 30 to produce one of four output enable
signals OE1-OE4 as follows:

SEL  S2  Output
0    0   OE1
0    1   OE2
1    0   OE3
1    1   OE4

The signals OE1 and OE2 are connected to the output enable terminals of the two columns of
RAMs in buffer 20, and the signals OE3 and OE4 are connected to the output enable terminals of the two
columns of RAMs in buffer 21. The data outputs of the RAMs are connected to eight 4:1 switches 31,
controlled by the bits S0, S1. These switches select one bit position from each RAM.
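The selection performed by the decoder 30 and the switches 31 can be modelled as below. This is a sketch under stated assumptions: the function name and the returned field names are hypothetical, the decoder is taken to assert OE1 or OE2 when SEL = 0 (buffer 20 being read) and OE3 or OE4 when SEL = 1, and the numbering of bit columns from 0 to 7 is an illustrative convention.

```python
def decode_read_select(sel, s):
    """Map SEL and the 3-bit value S2 S1 S0 to the read selection.

    sel: selection signal SEL (0 or 1).
    s:   integer 0-7 whose bits are S2 (most significant), S1, S0.
    """
    s2 = (s >> 2) & 1
    return {
        "buffer": 20 if sel == 0 else 21,          # SEL = 0 reads buffer 20
        "output_enable": f"OE{2 * sel + s2 + 1}",  # assumed decoder 30 mapping
        "ram_column": s2,                          # one of two columns of RAMs
        "bit_position": s & 0b11,                  # 4:1 switch setting from S1, S0
        "bit_column": s & 0b111,                   # assumed column index 0-7
    }
```

For instance, SEL = 1 with S2 S1 S0 = 101 selects buffer 21, enables OE4 and sets the 4:1 switches to bit position 1.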

Thus, it can be seen that SEL selects one of the buffers 20,21 for reading, S2 selects one column of RAMs
within that buffer, and S0, S1 select one column of bit positions (such as that represented by X-X in Fig.
3) from the selected column of RAMs. The bits are read out on an 8-bit output path 32.

The path 32 is connected in parallel to the data inputs of four 8-bit registers 33. These registers are clocked
in turn by signals from a decoder 34, so as to assemble each group of four successive bytes into a 32-bit
word. In other words, the registers 33 demultiplex the data, converting it from a succession of 8-bit bytes
into 32-bit words at one quarter of the clock rate of the bytes. The output of the registers 33 is fed either to
the video display 13 or to the array processor 10.
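The demultiplexing action of the registers 33 can be sketched as a small Python class. The class and method names are hypothetical. Two simplifications are assumed: the register to be loaded is tracked by an internal index rather than by the decoder 34, and the first byte received is placed in the least significant position of the word.

```python
class DemuxRegisters:
    """Model of m q-bit registers loaded in turn to assemble (m*q)-bit words."""

    def __init__(self, m=4, q=8):
        self.m, self.q = m, q
        self.registers = [0] * m
        self.index = 0  # which register is clocked next

    def clock(self, byte):
        """Load one q-bit group; return a word on every m-th group, else None."""
        self.registers[self.index] = byte & ((1 << self.q) - 1)
        self.index = (self.index + 1) % self.m
        if self.index:
            return None
        # Assumption: register 0 holds the least significant group.
        return sum(r << (self.q * i) for i, r in enumerate(self.registers))
```

Clocking in four bytes produces one 32-bit word, so the word rate is one quarter of the byte rate as described.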

Referring now to Fig. 4, the buffers 20,21 are controlled by two 12-bit counters 40,41. The bits of each
counter are numbered 0-11 where bit 0 is the least significant bit.

Bits 2,3,4,5,6,0,1,10,11 of counter 40 supply a write address WA0-WA8, while bits 7,8,9 supply the control
signals W0,W1,W2. Similarly, bits 2,3,4,5,6,10,11,0,1 of counter 41 supply a read address RA0-RA8, while
bits 7,8,9 supply the control signals S0,S1,S2.
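The way the counter bits are permuted into address and control fields can be sketched in Python. The helper names below are hypothetical; the bit lists are those given in the text for counters 40 and 41, read in the order WA0-WA8 (or RA0-RA8) and W0-W2 (or S0-S2).

```python
WRITE_ADDR_BITS = (2, 3, 4, 5, 6, 0, 1, 10, 11)  # counter 40 bits -> WA0..WA8
READ_ADDR_BITS = (2, 3, 4, 5, 6, 10, 11, 0, 1)   # counter 41 bits -> RA0..RA8
CONTROL_BITS = (7, 8, 9)                          # -> W0..W2 or S0..S2


def gather_bits(value, positions):
    """Collect the listed bit positions; the first listed becomes bit 0."""
    return sum(((value >> p) & 1) << i for i, p in enumerate(positions))


def write_fields(count):
    """Return (write address WA0-WA8, row select W0-W2) for a 12-bit count."""
    return gather_bits(count, WRITE_ADDR_BITS), gather_bits(count, CONTROL_BITS)


def read_fields(count):
    """Return (read address RA0-RA8, column select S0-S2) for a 12-bit count."""
    return gather_bits(count, READ_ADDR_BITS), gather_bits(count, CONTROL_BITS)
```

Because each mapping is a permutation of the twelve counter bits, every count from 0 to 4095 addresses a distinct row (or column) of a distinct location, so one pass of the counter covers the whole buffer exactly once.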

The read and write addresses are connected to the inputs of a switching circuit 42, which is controlled by
the signal SEL. When SEL = 1, the switching circuit takes the position shown, so that the address A0-A8
for buffer 20 is supplied by the write address WA0-WA8 while the address A'0-A'8 for buffer 21 is
supplied by the read address RA0-RA8. When SEL = 0 the circuit 42 is switched over so that these
connections are reversed.

Bits WA5,WA6 also provide the control for the multiplexing switch 25 (Fig. 2) and bits 
RA7,RA8 provide the control for the demultiplexing registers 33 by way of decoder 34 (Fig. 3). 

The counter 40 is incremented by a clock signal C.IN which has a frequency equal to four times the input
data word rate. Similarly, the counter 41 is incremented by a clock signal C.OUT at a frequency four times
the desired output data word rate.

When the counter 40 reaches its maximum count value (all ones) it stops and produces a signal FULL 
which indicates that the buffer which is currently being used for writing is now full. 

Similarly, when the counter 41 reaches its maximum count value, it stops and produces a signal 
EMPTY which indicates that the buffer which is currently being used for reading is now empty. 

When both these signals are true, an AND gate 43 is enabled, and this switches a bistable circuit 44 into its
opposite state so as to complement the value of SEL. This reverses the roles of the two buffers so that the
buffer which has just been written to is now selected for reading and vice versa.

The AND gate 43 also produces a LOAD signal which causes preset values from two five-bit registers
45,46 to be loaded into bits 7-11 of the respective counters 40,41, the remaining bits 0-6 being reset to
zero. These preset values allow the re-organisation unit to handle words of different sizes if required. For
handling 32-bit input and output words, both the preset values are zero; for smaller word sizes, they are set
to non-zero values.
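The buffer-swapping control can be sketched as a small state machine in Python. This is a behavioural model only: the class and method names are hypothetical, the initial value of SEL is an assumption, and the exact clock edge on which the final byte is transferred relative to FULL and EMPTY is not modelled.

```python
class BufferControl:
    """Behavioural sketch of counters 40,41, AND gate 43 and bistable 44."""

    MAX_COUNT = (1 << 12) - 1  # all ones in a 12-bit counter

    def __init__(self, write_preset=0, read_preset=0, sel=1):
        self.write_preset = write_preset & 0x1F  # five-bit register 45
        self.read_preset = read_preset & 0x1F    # five-bit register 46
        self.sel = sel                           # assumed initial state of SEL
        self._load()

    def _load(self):
        # LOAD: presets go into bits 7-11; bits 0-6 are reset to zero.
        self.write_count = self.write_preset << 7
        self.read_count = self.read_preset << 7

    @property
    def full(self):
        return self.write_count == self.MAX_COUNT

    @property
    def empty(self):
        return self.read_count == self.MAX_COUNT

    def clock_in(self):
        if not self.full:  # the counter stops at its maximum value
            self.write_count += 1

    def clock_out(self):
        if not self.empty:
            self.read_count += 1

    def update(self):
        """Swap the buffers and reload the counters when FULL and EMPTY are both true."""
        if self.full and self.empty:
            self.sel ^= 1
            self._load()
            return True
        return False
```

In this model a non-zero preset simply starts the count part-way through its range, which shortens the sequence before the next swap.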

Operation 

It can be seen that each buffer 20,21 contains a total of 32K bits (i.e. 16 RAMs each with 512 x 4 bits). These
bits are regarded as being logically arranged in a 32 x 32 x 32 cube as shown in Fig. 5. (This Figure
relates to the buffer 20; buffer 21 is similar except that it has address bits A'0-A'8 instead of A0-A8).

As shown, the x-dimension of this address space is addressed by bits S0,S1,S2,A5,A6, where bits A5,A6
specify one of four vertical layers, and bits S0,S1,S2 specify one vertical plane of bits within this layer.
The y-dimension is addressed by bits A0-A4. The z-dimension is addressed by bits W0,W1,W2,A7,A8
where bits A7,A8 specify one of four horizontal layers, and bits W0,W1,W2 specify one horizontal bit
plane within this layer.

When writing data into buffer 20, each byte is written horizontally in this address space, parallel to the
x-axis, into a location specified by A0-A8 and W0-W2. As can be seen from Fig. 4, when writing to the
buffer 20, bits A5,A6 come from the least significant end of counter 40, bits A0-A4 from the middle, and
bits W0,W1,W2,A7,A8 from the most significant end. Thus, the bits A5,A6 are
incremented for each byte, so that successive bytes are written into successive byte locations along the
direction of the x-axis. A complete 32-bit word is therefore written along a row parallel to the x-axis. The
bits A0-A4 are incremented for each word, so that successive words are written into successive rows in the
direction of the y-axis. A complete 32 x 32 plane of data is therefore built up parallel to the x-y plane.
Successive data planes are written in the direction of the z-axis, as the bits W0,W1,W2,A7,A8 are
incremented.

When reading from the buffer 20, each byte is read vertically, parallel to the z-axis, from a location
specified by the bits A0-A8 and S0-S2. As seen from Fig. 4, when reading from the buffer, the bits A7,A8
are derived from the least significant end of the counter 41, bits A0-A4 from the middle, and bits
S0,S1,S2,A5,A6 from the most significant end. Thus, the bits A7,A8 are incremented for each byte, so
that successive bytes are read from successive byte locations along the direction of the z-axis. A complete
32-bit word is therefore read from a column parallel to the z-axis. The bits A0-A4 are incremented for
each word so that successive words are read out from successive columns in the direction of the y-axis. In
this way, a complete plane of data parallel to the y-z plane is read out. Successive data planes in the
direction of the x-axis are read out as the bits S0,S1,S2,A5 and A6 are incremented.

In summary, data is written into the buffer as a sequence of planes parallel to the x-y plane, and is then 
read out as a sequence of planes parallel to the y-z plane (i.e. at right angles to the first planes). This 
enables the buffer to act as a corner-turning buffer for re-organising data. 
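The corner-turning operation summarised above can be sketched in Python by modelling the address space directly as a cube of bits, ignoring the RAM-level addressing. The function name is hypothetical, and placing bit 0 of each word at x = 0 (on writing) or z = 0 (on reading) is an assumption about bit ordering.

```python
def corner_turn(words, size=32):
    """Write size*size words as x-y planes, then read them back as y-z planes.

    Input word number z*size + y lies along the x-axis at row y of plane z.
    Output word number x*size + y lies along the z-axis at column y of plane x.
    """
    if len(words) != size * size:
        raise ValueError("expected size*size input words")
    # cube[z][y][x] holds one bit; each input word fills one row along x.
    cube = [[[(words[z * size + y] >> x) & 1 for x in range(size)]
             for y in range(size)]
            for z in range(size)]
    out = []
    for x in range(size):          # successive y-z planes
        for y in range(size):      # successive columns within the plane
            word = 0
            for z in range(size):  # bits of one output word lie along z
                word |= cube[z][y][x] << z
            out.append(word)
    return out
```

In this model bit z of output word (x, y) equals bit x of input word (z, y), so applying the function twice returns the original data.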

In the system shown in Fig. 1, data from the array processor 10 is received by the buffer in sub-frame
order, and successive sub-frames are therefore written into the buffer in successive x-y planes. When the
buffer is full, it contains a complete row of sub-frames, consisting of 32 complete scan lines. The data is
then read out of successive y-z planes. Each of these planes contains the 1024 bits making up a single scan
line. Thus the output data is in the correct order for feeding to the video display 13. The operation of the
buffer is similar for data passing between the video input device 12 and the array processor 10.

Variable sequence generator 

The arrangement described above may be modified by replacing the counters 40,41 and the switch 42 by
a pair of variable address sequence generators, one for each buffer. Fig. 6 shows the generator for buffer
20; that for buffer 21 is identical except that it is controlled by the inverse of SEL, and produces the
address bits A'0-A'8 instead of A0-A8.

The variable sequence generator comprises a programmable read-only memory (PROM) 60 and two
counters 61,62 which produce two five-bit counts A and B. The PROM has 512 individually addressable
locations, each of which holds six bits, providing six output signals X, D, C, AE, BE and F. Bits C and D
provide two single-bit counts which can be combined to act as a two-bit count. Bit X acts as the carry-out
for the two-bit count. Bits AE and BE are connected to the enable inputs EN of the counters 61,62 so that
whenever one of those bits is true the corresponding count A or B is incremented at the next clock beat.
Bit F provides an output signal FINISH indicating the end of the address sequence.

The sequence generator receives a 12-bit preset start address from a register 63. This controls the length of
the generated address sequence, in the same way as registers 45,46 in Fig. 4.

The generator also receives a 5-bit sequence number SEQ which selects a particular sequence. 

The PROM 60 is addressed by a nine-bit address. The first two bits C',D' of this address are supplied by a
two-way switch 64 controlled by bit X. When X = 0, the switch is in the position shown and hence selects
bits C,D. When X = 1, the switch is set into the opposite position and therefore selects two preset bits from
the register 63. The next two address bits are supplied by carry-out signals AC, BC from the counters 61,62.
The remaining five address bits are supplied by the sequence number SEQ.

The carry-out signals AC,BC are also fed to the load terminals LD of the respective counters 61,62 so that
whenever one of these counters overflows, it is reloaded with preset bits from the register 63.

It can be seen that the sequence generator provides two five-bit counts A,B and two single-bit counts C and
D. By suitably programming the PROM 60, these four counts can be assembled in various different ways to
form a single 12-bit count. For example, it may be desired to assemble the counts in the order A,D,C,B
where A provides the least significant five bits of the 12-bit count and B provides the most significant five
bits.

This count sequence can be achieved by programming the first 16 locations of the PROM 60 as shown in
Table I below.

TABLE I

     Inputs             Outputs
AC  BC  C'  D'    AE  BE  C   D   X   F
0   0   0   0     1   0   0   0   0   0
0   0   0   1     1   0   0   1   0   0
0   0   1   0     1   0   1   0   0   0
0   0   1   1     1   0   1   1   0   0
0   1   0   0     1   0   0   0   0   0
0   1   0   1     1   0   0   1   0   0
0   1   1   0     1   0   1   0   0   0
0   1   1   1     1   0   1   1   0   0
1   0   0   0     1   0   0   1   0   0
1   0   0   1     1   0   1   0   0   0
1   0   1   0     1   0   1   1   0   0
1   0   1   1     1   1   0   0   1   0
1   1   0   0     1   0   0   1   0   0
1   1   0   1     1   0   1   0   0   0
1   1   1   0     1   0   1   1   0   0
1   1   1   1     1   1   0   0   1   1

It can be seen that the output AE is always equal to 1. Hence, the counter 61 is always enabled so that
count A is incremented at each clock beat. This is necessary since count A represents the least significant
bits of the count sequence.

When count A overflows, AC is true and it can be seen from Table I that this causes the value of D to
reverse i.e. each location with AC = 1 has D equal to the complement of D'. Similarly, if both AC and D'
are true, then the value of C is reversed. The effect of this is to cause the two bits C,D to step through the
count sequence 00, 01, 10, 11; i.e. the bits C,D provide a two-bit count driven by the carry-out of count A.

When AC, C' and D' are all true, the output signal BE is produced, and this causes count B to be
incremented. Also, the signal X is produced, which causes the signals C',D' to be selected from the preset
inputs, rather than from C and D; this causes the count C,D to be re-initialised at the specified preset value.

When AC, BC, C' and D' are all true, the output signal F is produced, indicating the end of the sequence.
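The rules just described for the count order A,D,C,B can be expressed as Boolean functions of the PROM inputs. The Python sketch below is an interpretation, not the PROM contents themselves: the function names are hypothetical, the conditions for BE, X and F are read as applying to the inputs C' and D', and the rows are enumerated in the input column order AC, BC, C', D' rather than by physical PROM address.

```python
def prom_outputs(ac, bc, c_in, d_in):
    """Return (AE, BE, C, D, X, F) for inputs AC, BC, C', D' (each 0 or 1)."""
    ae = 1                     # count A advances on every clock beat
    d = d_in ^ ac              # D is complemented whenever count A overflows
    c = c_in ^ (ac & d_in)     # C is complemented when A overflows and D' is 1
    be = ac & c_in & d_in      # two-bit count C,D overflows: advance count B
    x = be                     # reselect C',D' from the preset bits
    f = ac & bc & c_in & d_in  # end of the whole address sequence
    return ae, be, c, d, x, f


def prom_table():
    """List ((AC, BC, C', D'), outputs) for the 16 input combinations."""
    rows = []
    for value in range(16):
        inputs = ((value >> 3) & 1, (value >> 2) & 1, (value >> 1) & 1, value & 1)
        rows.append((inputs, prom_outputs(*inputs)))
    return rows
```

Other count orders would correspond to different Boolean rules, selected in the apparatus by the sequence number SEQ.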

Referring again to Fig. 6, this also shows the way in which the address bits A0-A8 for the buffer 20 are
derived from the output of the sequence generator. Address bits A0-A4 are obtained from the counter 62.
Address bits A5,A6 and A7,A8 are selected by switches 65,66, both of which are controlled by the signal
SEL. When SEL = 1, the switches are set in the position shown, so that A5 and A6 are supplied by C and
D, and A7,A8 are supplied by the two least significant bits of counter 61. When SEL = 0, the switches
65,66 are set into the opposite position, so that A5,A6 now come from counter 61 and A7,A8 are supplied
by C and D. The three most significant bits of counter 61 provide the bits W0,W1,W2 and S0,S1,S2.

It will be appreciated that many other modifications to the system described above may be made without
departing from the scope of the invention. For example, the buffers 20,21 may be organised as 16 x 4
arrays of bit positions instead of as 8 x 8 arrays. This would allow higher rates of data transfer into (or out
of) the buffers than in the opposite direction.

CLAIMS

1. Data reorganisation apparatus comprising:

(a) a buffer store having a width equal to p x q bit positions, these positions being logically arranged in 
rows and columns with p bits per row and q bits per column, 

(b) multiplexing means for receiving a succession of input data words each of n x p bits and converting 
these into a succession of p-bit groups at n times the clock rate of the input words, 

(c) input means for writing each p-bit group into a selected row of bit positions in the buffer store, 

(d) output means for reading a succession of q-bit groups from selected columns of bit positions in the 
buffer store, and 

(e) demultiplexing means for assembling the q-bit groups read from the buffer store into m x q-bit words
at one mth the clock rate of the q-bit groups, where p, q, n and m are all integers greater than one.

2. Apparatus according to Claim 1 wherein the buffer store comprises a plurality of random-access memory
(RAM) components each having a plurality of addressable locations and each location containing a
plurality of bits which can be accessed in parallel, wherein the number of RAM components times the
number of bits in each RAM location equals p x q.

3. Apparatus according to Claim 2 wherein all the RAM components in the buffer are addressed in parallel
so as to select a corresponding location in each RAM component.

4. Apparatus according to Claim 3 including means for generating a write address, means for generating a
read address, and switching means for selectively applying either the write address or the read address to
the RAM components in the buffer.

5. Apparatus according to Claim 4 wherein said multiplexing means is controlled by a predetermined 
portion of said write address. 

6. Apparatus according to Claim 4 or 5 wherein said demultiplexing means is controlled by a 
predetermined portion of said read address. 

7. Apparatus according to any one of Claims 4 to 6 wherein the means for generating the write address
comprises a first counter, predetermined bits of which provide said write address, and further bits of which
provide a control signal for selecting the row of bit positions into which the p-bit group is to be written.

8. Apparatus according to Claim 7 wherein the means for generating the read address comprises a second
counter, predetermined bits of which provide said read address, and further bits of which provide a control
signal for selecting the column of bit positions from which the q-bit group is to be read.

9. Apparatus according to any preceding Claim including a second buffer store which operates in
conjunction with the first-mentioned buffer store to provide a double buffer arrangement in which data is
written into the first buffer while it is being read from the second and vice versa.

10. Apparatus according to Claim 9 when dependent upon Claim 4 wherein said switching means is
operable to apply the write address to either buffer store and to apply the read address to the other buffer
store.

11. Apparatus according to Claim 10 including means for operating the switching means so as to reverse
the application of the read and write addresses to the buffer stores upon detecting that a predetermined
number of p-bit groups has been written into the buffer store currently addressed by the write address and a
predetermined number of q-bit groups has been read from the other buffer store.

12. Data reorganisation apparatus substantially as hereinbefore described with reference to 
Figs. 2 to 4 of the accompanying drawings. 

13. Data reorganisation apparatus substantially as hereinbefore described with reference to 
Figs. 2,3 and 5 of the accompanying drawings. 

14. Image processing apparatus comprising 

(a) means for processing image data in sub-frame order, 

(b) means for displaying image data in scan-line order, and 

(c) data reorganisation apparatus according to any preceding claim, connected between the processing 
means and the displaying means, for converting between said sub-frame and said scan-line order for 
display. 